A Multi-Dimensional Evaluation of Synthetic Data Generators

Abstract

Synthetic datasets are gradually emerging as solutions for data sharing. Multiple synthetic data generators have been introduced in the last decade, fueled by advancements in machine learning and by increased demand for fast and inclusive data sharing, yet their utility is not well understood. Prior research has tried to compare the utility of synthetic data generators using different evaluation metrics. These metrics have been found to generate conflicting conclusions, making direct comparison of generators very difficult. This paper identifies four criteria (or dimensions) for evaluating masked data by classifying the available utility metrics into categories based on the measure they attempt to preserve: attribute fidelity, bivariate fidelity, population fidelity, and application fidelity. A representative metric from each category is chosen based on popularity and consistency, and these metrics are used to compare the overall utility of recent data synthesizers across 19 datasets of different sizes and feature counts. The paper also examines correlations between the selected metrics in an attempt to streamline utility evaluation.
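
The abstract does not say which metric represents each dimension. As a rough illustration only, the sketch below assumes the Hellinger distance between marginal distributions as a stand-in for attribute fidelity, and the absolute difference in Pearson correlation as a stand-in for bivariate fidelity. Both choices, and the function names `hellinger` and `correlation_gap`, are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter


def hellinger(real, synth):
    """Hellinger distance between the empirical distributions of two
    categorical samples (0 = identical, 1 = no shared categories).
    Assumed here as an attribute-fidelity proxy."""
    p, q = Counter(real), Counter(synth)
    n_p, n_q = len(real), len(synth)
    total = sum(
        (math.sqrt(p[k] / n_p) - math.sqrt(q[k] / n_q)) ** 2
        for k in set(p) | set(q)  # Counter returns 0 for unseen categories
    )
    return math.sqrt(total / 2)


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_gap(real_x, real_y, synth_x, synth_y):
    """Absolute difference between the real and synthetic correlation of
    one attribute pair. Assumed here as a bivariate-fidelity proxy."""
    return abs(pearson(real_x, real_y) - pearson(synth_x, synth_y))
```

Lower values on either function mean the synthetic sample is closer to the real one on that dimension. Population and application fidelity usually require fitting a model, so they are left out of this sketch.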

Similar resources

Synthetic Generators for Cloning Social Network Data

Synthetic social network generators are useful for a variety of purposes, including benchmarking algorithms, modeling human interactions within agent-based simulations, and debugging code. Despite the increased availability of social media data, collecting data directly from these networks is not always feasible due to privacy concerns. Often data access is restricted to “silos” of analysts wit...

Re-Identification and Synthetic Data Generators: A Case Study

Synthetic generators are increasingly used to replace sensitive data with artificial data preserving to a predetermined extent the utility of the original data. When using synthetic data generators, re-identification analysis is usually disregarded on the grounds that, the released data being artificial, no real re-identification is possible. While this may be reasonable if synthetic generation...

Distributed Searching of Multi-dimensional Data: A Performance Evaluation Study

In this paper we present a data structure for searching multi-dimensional point sets in distributed environments and discuss its experimental evaluation, including a comparison with previous proposals. The data structure is based on an extension of k-d trees. The technological reference context is a distributed environment where multicast (i.e., restricted broadcast) is allowed, but it is ...

A method for 2-dimensional inversion of gravity data

Applying 2D algorithms for inverting the potential field data is more useful and efficient than their 3D counterparts, whenever the geologic situation permits. This is because the computation time is less and modeling the subsurface is easier. In this paper we present a 2D inversion algorithm for interpreting gravity data by employing a set of constraints including minimum distance, smoothness,...

Visualizing Multi-Dimensional Data

High-dimensional data visualization is very important in data analysis since it gives a direct and natural view of the data. In this paper, we propose a method to visualize large amounts of high-dimensional data in a 3-D space. In our method, we first divide the high-dimensional data into several groups of lower-dimensional data. Then, we use different icons to represent different groups. Initial expe...

Journal

Journal title: IEEE Access

Year: 2022

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2022.3144765